Parallel Algorithms for Shared-Memory Machines: A Standard Wasteful Implementation
Author
Abstract
The technique enables a simple cost-effective implementation with little effort. It was used for the first time to implement a fast optimal parallel hashing algorithm [7]. The hashing algorithm in [7] comprises two parts: the first part is a randomized geometric-decaying algorithm which runs for O(lg lg n) steps. By using the technique of this paper and the O(lg lg n)-time load balancing of [6], this part was implemented optimally with an expected additive overhead of O(lg lg n · lg n) time, which implies a work-optimal implementation that takes O(lg lg n · lg n) time, i.e., using p = n / (lg lg n · lg n) processors. The second part runs in O(lg lg n) time using p processors, implying a work-optimal implementation for the entire algorithm that takes O(lg lg n · lg n) expected time. Subsequent to a preliminary version of this paper (presented in [7]), other policies of effective load balancing for other classes of algorithms were introduced, often using amortization arguments similar to the one used here [17, 9, 10, 5] (see also [16]).

… in an attempt to achieve an o(lg ℓ) overhead. Suppose that t_i is set to satisfy t_i x_i = ℓ lg ℓ, and let us ignore the extra execution time imposed on the processors due to imbalance. Then, (6) is replaced by t_i = ℓ lg ℓ / x_i, but (7) and (8) do not change. It is easy to verify that the overhead is still Ω(lg ℓ). Moreover, it is easy to see that the performance of the above policy (when ignoring the extra execution due to imbalance, as above) can match the performance of any other policy of load balancing utilization, up to a constant factor. The policy presented in this paper is therefore optimal.

4 Conclusions

This paper introduces a simple, effective policy for employing load balancing algorithms for processor scheduling in geometric-decaying algorithms. The resulting implementation is as simple as the standard implementation, the only difference being a more careful account of when and how often load balancing should be used. The load balancing algorithm is used as a black box, and the suggested policy is thus applicable to many models of parallel computation. We defined the overhead of policies for invoking load balancing to implement Brent's scheduling principle. The overhead can be used to evaluate and compare such policies. It was shown that the overhead for our …
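The abstract only names the scheduling idea; as a rough illustration of what treating the load balancer as a black box and invoking it only occasionally can look like for a geometric-decaying computation, here is a toy simulation. Everything in it is an assumption for illustration: the exponentially spaced trigger for calling the balancer, the fixed balancing cost, and the Brent-style accounting that charges each step the load of the busiest processor. It is not the policy analyzed in the paper.

```cpp
// Toy accounting of a geometric-decaying computation on p "processors".
// A black-box load balancer is invoked only at exponentially spaced steps
// (an assumption standing in for the paper's policy), and parallel time is
// charged Brent-style: each step costs the load of the busiest processor,
// each balancing call costs a fixed amount.  All constants are illustrative.
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const int p = 8;                    // number of processors
    const int n = 1 << 16;              // initial number of live tasks
    const long long balance_cost = 10;  // assumed cost of one black-box balancing call

    std::mt19937 rng(42);
    std::bernoulli_distribution survives(0.5);    // geometric decay: ~half the tasks survive a step

    std::vector<long long> load(p, 0);            // tasks currently held by each processor
    for (int i = 0; i < n; ++i) load[i % p]++;

    long long total = n, step = 0, next_balance = 1;
    long long parallel_time = 0, balance_calls = 0;

    while (total > 0) {
        // One step: every processor works through its own tasks; the step
        // takes as long as the most loaded processor (Brent-style charge).
        parallel_time += *std::max_element(load.begin(), load.end());

        // Each task independently survives to the next step with probability 1/2.
        total = 0;
        for (auto &l : load) {
            long long next = 0;
            for (long long t = 0; t < l; ++t) next += survives(rng);
            l = next;
            total += next;
        }

        // Sparse policy sketch: call the balancer after steps 1, 2, 4, 8, ...,
        // so only O(lg l) calls are made over l decay steps.
        if (++step == next_balance && total > 0) {
            for (int i = 0; i < p; ++i)           // the balancer itself is a black box;
                load[i] = total / p + (i < total % p ? 1 : 0);   // here: spread evenly
            parallel_time += balance_cost;
            ++balance_calls;
            next_balance *= 2;
        }
    }
    std::printf("parallel time %lld with %lld balancing calls\n", parallel_time, balance_calls);
}
```

In this toy the binomial decay keeps the loads roughly even on its own; the point is only the sparse invocation pattern and the accounting, which is what the overhead measure defined in the paper compares across policies.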
Similar papers
Virtual Shared Memory and Distributed Memory Implementations of Parallel Algorithms for Numerical Integration
Parallel globally adaptive algorithms for numerical integration provide a simple example of algorithms that exploit control parallelism. In this paper we consider the implementation of such algorithms on both virtual shared memory (KSR-1) and distributed memory (iPSC/860) machines and investigate how the characteristics of the different architectures affect the choice of implementation and thereby…
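For orientation, here is a small sketch of what a globally adaptive parallel quadrature scheme of the kind described can look like on a shared-memory machine: all workers share one priority queue ordered by local error estimate (control parallelism), rather than each owning a fixed tile of the domain. The integrand, tolerance, seed intervals, and the trapezoid-based error proxy are illustrative assumptions, not details from the paper.

```cpp
// Sketch: globally adaptive quadrature with worker threads pulling the piece
// with the largest error estimate from one shared priority queue.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

static double f(double x) { return std::sin(x) / (1.0 + x * x); }   // example integrand

struct Piece { double err, a, b, estimate; };
struct ByErr {
    bool operator()(const Piece &l, const Piece &r) const { return l.err < r.err; }
};

// Trapezoid rule on [a,b]; the error proxy compares one panel against two half-panels.
static Piece evaluate(double a, double b) {
    double one = 0.5 * (b - a) * (f(a) + f(b));
    double m = 0.5 * (a + b);
    double two = 0.25 * (b - a) * (f(a) + 2.0 * f(m) + f(b));
    return {std::fabs(one - two), a, b, two};
}

int main() {
    const double tol_per_piece = 1e-9;
    std::priority_queue<Piece, std::vector<Piece>, ByErr> work;     // worst piece on top
    std::mutex lock;
    double result = 0.0;

    for (int k = 0; k < 16; ++k)                                    // seed with 16 subintervals of [0,10]
        work.push(evaluate(k * 10.0 / 16.0, (k + 1) * 10.0 / 16.0));

    auto worker = [&]() {
        for (;;) {
            Piece piece;
            {
                std::lock_guard<std::mutex> g(lock);
                if (work.empty()) return;    // pieces held by other workers are finished by them
                piece = work.top();
                work.pop();
            }
            if (piece.err < tol_per_piece) {                        // accurate enough: accumulate
                std::lock_guard<std::mutex> g(lock);
                result += piece.estimate;
            } else {                                                // refine the globally worst piece
                double m = 0.5 * (piece.a + piece.b);
                Piece left = evaluate(piece.a, m), right = evaluate(m, piece.b);
                std::lock_guard<std::mutex> g(lock);
                work.push(left);
                work.push(right);
            }
        }
    };

    unsigned nthreads = std::max(2u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < nthreads; ++i) pool.emplace_back(worker);
    for (auto &t : pool) t.join();
    std::printf("integral over [0,10] ~= %.9f\n", result);
}
```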
Paclib: a System for Parallel Algebraic Computation on Shared Memory Multiprocessors
This paper gives an overview of the structure and the use of Paclib, a new system for parallel algebraic computation on shared memory computers. Paclib has been developed as a professional tool for the simple design and efficient implementation of parallel algorithms in computer algebra and related areas. It provides concurrency, shared memory communication, non-determinism, speculative parallelism…
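The snippet lists speculative parallelism among Paclib's primitives without showing how it is used. Purely as a generic illustration of the concept in standard C++ (this is not Paclib's actual interface; the strategy names, timings, and the cancellation flag are made up), two strategies race for the same answer and the loser is asked to stop.

```cpp
// Generic speculative parallelism: run two strategies concurrently, keep the
// first result, cancel the other via a shared flag.  Not Paclib's API.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>

int main() {
    std::atomic<bool> done{false};
    std::mutex lock;
    long long answer = 0;
    const char *winner = "";

    // Hypothetical strategies: both "compute" the same value, at different speeds.
    auto strategy = [&](const char *name, int ms, long long value) {
        for (int i = 0; i < ms; ++i) {
            if (done.load()) return;                         // speculation cancelled
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        if (!done.exchange(true)) {                          // first finisher publishes
            std::lock_guard<std::mutex> g(lock);
            answer = value;
            winner = name;
        }
    };

    std::thread a(strategy, "closed-form", 5, 42LL);
    std::thread b(strategy, "brute-force", 50, 42LL);
    a.join();
    b.join();
    std::printf("answer %lld computed by the %s strategy\n", answer, winner);
}
```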
An Efficient Parallel Algorithm for Graph-Based Image Segmentation
Automatically partitioning images into regions (‘segmentation’) is challenging in terms of quality and performance. We propose a Minimum Spanning Tree-based algorithm with a novel graph-cutting heuristic, the usefulness of which is demonstrated by promising results obtained on standard images. In contrast to data-parallel schemes that divide images into independently processed tiles, the algorithm…
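The snippet identifies the segmenter as MST-based with a graph-cutting heuristic but does not spell the heuristic out. For orientation, here is the common sequential skeleton such segmenters share, with the Felzenszwalb-Huttenlocher merge rule standing in for the paper's heuristic; the graph, the parameter k, and the toy example in main are illustrative assumptions.

```cpp
// MST-based segmentation skeleton: process edges in weight order (Kruskal
// style) with union-find, merging two regions when the connecting edge is no
// heavier than their internal variation plus a size-dependent slack.
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

struct Edge { int u, v; float w; };

struct UnionFind {
    std::vector<int> parent, size;
    std::vector<float> internal;                 // heaviest MST edge inside each region
    explicit UnionFind(int n) : parent(n), size(n, 1), internal(n, 0.0f) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void unite(int a, int b, float w) {
        if (size[a] < size[b]) std::swap(a, b);
        parent[b] = a;
        size[a] += size[b];
        internal[a] = w;                         // edges arrive in increasing weight
    }
};

// Segment a graph with n vertices; k controls how eagerly regions merge.
std::vector<int> segment(int n, std::vector<Edge> edges, float k) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge &a, const Edge &b) { return a.w < b.w; });
    UnionFind uf(n);
    for (const Edge &e : edges) {
        int a = uf.find(e.u), b = uf.find(e.v);
        if (a == b) continue;
        float slack_a = uf.internal[a] + k / uf.size[a];
        float slack_b = uf.internal[b] + k / uf.size[b];
        if (e.w <= std::min(slack_a, slack_b))   // edge is "weak" relative to both regions
            uf.unite(a, b, e.w);
    }
    std::vector<int> label(n);
    for (int v = 0; v < n; ++v) label[v] = uf.find(v);
    return label;
}

int main() {                                     // toy 4-pixel example graph
    std::vector<Edge> edges = {{0, 1, 0.1f}, {1, 2, 0.9f}, {2, 3, 0.2f}};
    auto label = segment(4, edges, 0.5f);
    for (int v = 0; v < 4; ++v) std::printf("pixel %d -> region %d\n", v, label[v]);
}
```

The paper's contribution is a parallel formulation of this kind of computation that avoids splitting the image into independent tiles; the sketch above is deliberately sequential.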
Parallel Graph Generation Algorithms for Shared and Distributed Memory Machines
In this paper we give an overview and a comparison of two parallel algorithms for state space generation in stochastic modeling on common classes of multiprocessors. In this context, state space generation simply means constructing a graph, which usually gets extremely large. On shared memory machines, the key problem for a parallelization is the implementation of a shared data structure which e…
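The snippet points to a shared data structure as the crux of parallel state space generation on shared memory machines. As a small stand-in (not the paper's implementation), here is a level-synchronous parallel BFS over a toy transition system, where a mutex-protected hash set of visited states plays the role of that shared structure; the successor function, state encoding, and thread count are assumptions.

```cpp
// Level-synchronous parallel state space exploration with a shared visited set.
#include <cstdio>
#include <mutex>
#include <thread>
#include <unordered_set>
#include <vector>

using State = long long;

// Toy transition relation: each state has two successors, bounded to keep it finite.
static std::vector<State> successors(State s) {
    std::vector<State> next;
    if (2 * s + 1 < 100000) { next.push_back(2 * s); next.push_back(2 * s + 1); }
    return next;
}

int main() {
    const unsigned nthreads = 4;
    std::unordered_set<State> visited = {1};     // the shared structure (here: one big lock)
    std::mutex visited_lock;
    std::vector<State> frontier = {1};

    while (!frontier.empty()) {
        std::vector<std::vector<State>> next(nthreads);      // per-thread output buffers
        std::vector<std::thread> pool;
        for (unsigned t = 0; t < nthreads; ++t) {
            pool.emplace_back([&, t]() {
                for (size_t i = t; i < frontier.size(); i += nthreads)   // split the frontier
                    for (State s : successors(frontier[i])) {
                        std::lock_guard<std::mutex> g(visited_lock);
                        if (visited.insert(s).second)        // first time this state is seen
                            next[t].push_back(s);
                    }
            });
        }
        for (auto &th : pool) th.join();

        frontier.clear();                                    // next level = union of the buffers
        for (auto &buf : next) frontier.insert(frontier.end(), buf.begin(), buf.end());
    }
    std::printf("reachable states: %zu\n", visited.size());
}
```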
WHAT GOOD ARE SHARED-MEMORY MODELS? - Parallel Processing, 1996. Proceedings of the 1996 ICPP Workshop on Challenges for
Shared memory models have been criticized for years for failing to model essential realities of parallel machines. Given the current wave of popular message-passing and distributed memory models (e.g., BSP, LogP), it is natural to ask whether shared memory models have outlived any usefulness they may have had. In this invited position paper, we discuss the continuing importance of shared memory…
The Design of the PACLIB Kernel
This paper describes the runtime kernel of paclib, a new system for parallel algebraic computation on shared memory computers. paclib has been developed as a professional tool for the simple design and efficient implementation of parallel algorithms in computer algebra and related areas. It provides concurrency, shared memory communication, non-determinism, speculative parallelism, streams and pi…
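Among the kernel primitives the snippet lists are streams. As a generic stand-in, not paclib's interface, the following shows a minimal bounded stream between a producer and a consumer thread built from a mutex and two condition variables; the capacity and the example data are arbitrary.

```cpp
// A bounded, closable stream shared between two threads (C++17).
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>

template <typename T>
class Stream {
    std::deque<T> buf;
    std::mutex m;
    std::condition_variable not_empty, not_full;
    bool closed = false;
    const size_t capacity = 16;
public:
    void put(T x) {                               // blocks while the stream is full
        std::unique_lock<std::mutex> l(m);
        not_full.wait(l, [&] { return buf.size() < capacity; });
        buf.push_back(std::move(x));
        not_empty.notify_one();
    }
    void close() {                                // no more values will be produced
        std::lock_guard<std::mutex> l(m);
        closed = true;
        not_empty.notify_all();
    }
    std::optional<T> get() {                      // empty optional means "stream ended"
        std::unique_lock<std::mutex> l(m);
        not_empty.wait(l, [&] { return !buf.empty() || closed; });
        if (buf.empty()) return std::nullopt;
        T x = std::move(buf.front());
        buf.pop_front();
        not_full.notify_one();
        return x;
    }
};

int main() {
    Stream<int> s;
    std::thread producer([&] {
        for (int i = 1; i <= 100; ++i) s.put(i * i);   // e.g. a stream of squares
        s.close();
    });
    long long sum = 0;
    while (auto x = s.get()) sum += *x;                // consume until the stream closes
    producer.join();
    std::printf("sum of streamed values: %lld\n", sum);
}
```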